Finite-Sample Analysis of Proximal Gradient TD Algorithms

Authors

Bo Liu

Ji Liu

Mohammad Ghavamzadeh

Sridhar Mahadevan

Marek Petrik

Abstract

In this paper, we show for the first time how gradient TD (GTD) reinforcement learning methods can be formally derived as true stochastic gradient algorithms, not with respect to their original objective functions as previously attempted, but with respect to derived primal-dual saddle-point objective functions. We then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms used stochastic approximation techniques to prove asymptotic convergence, and no finite-sample analysis had been attempted. Two novel GTD algorithms are also proposed, namely projected GTD2 and GTD2-MP, which use proximal “mirror maps” to yield improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, owing to its linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
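The abstract's central idea, recasting GTD as stochastic gradient on a primal-dual saddle-point objective, can be illustrated with a short sketch. It assumes the standard linear setup, which is not spelled out on this page: features φ, next-state features φ′, reward r, discount γ, importance weight ρ, and L(θ, y) = ⟨b − Aθ, y⟩ − ½‖y‖²_M with A = E[ρφ(φ − γφ′)ᵀ], b = E[ρrφ], M = E[φφᵀ], whose maximum over y is half the MSPBE. GTD2 is then one stochastic descent step in θ and one ascent step in y. The `gtd2_mp_step` function is an assumed Euclidean extragradient form (mirror-prox with a squared-distance mirror map), and the optional ball projection stands in for the projection in projected GTD2. The radius, step sizes, helper names, and two-state chain are hypothetical illustration choices, not the authors' settings.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def project_ball(v, radius):
    """Euclidean projection onto {x : ||x|| <= radius}; radius=None means no projection."""
    if radius is None:
        return list(v)
    norm = math.sqrt(dot(v, v))
    if norm <= radius:
        return list(v)
    return [x * radius / norm for x in v]


def _grads(theta, y, phi, phi_next, r, gamma, rho):
    """Single-sample estimates of -grad_theta L and grad_y L at (theta, y)."""
    dphi = [p - gamma * q for p, q in zip(phi, phi_next)]  # phi - gamma * phi'
    delta = r - dot(theta, dphi)  # TD error: r + gamma*theta.phi' - theta.phi
    phi_y = dot(phi, y)
    g_theta = [rho * d * phi_y for d in dphi]  # estimates A^T y
    g_y = [(rho * delta - phi_y) * p for p in phi]  # estimates b - A theta - M y
    return g_theta, g_y


def gtd2_step(theta, y, phi, phi_next, r, gamma, rho, alpha, beta, radius=None):
    """GTD2 as one stochastic descent (theta) / ascent (y) step, optionally projected."""
    g_theta, g_y = _grads(theta, y, phi, phi_next, r, gamma, rho)
    new_theta = [t + alpha * g for t, g in zip(theta, g_theta)]
    new_y = [w + beta * g for w, g in zip(y, g_y)]
    return project_ball(new_theta, radius), project_ball(new_y, radius)


def gtd2_mp_step(theta, y, phi, phi_next, r, gamma, rho, alpha, beta, radius=None):
    """Assumed extragradient variant: step to a midpoint, then step from (theta, y)
    using gradient estimates evaluated at the midpoint (same sample)."""
    theta_m, y_m = gtd2_step(theta, y, phi, phi_next, r, gamma, rho, alpha, beta, radius)
    g_theta, g_y = _grads(theta_m, y_m, phi, phi_next, r, gamma, rho)
    new_theta = [t + alpha * g for t, g in zip(theta, g_theta)]
    new_y = [w + beta * g for w, g in zip(y, g_y)]
    return project_ball(new_theta, radius), project_ball(new_y, radius)


def run(step_fn, n_steps=60_000, gamma=0.6, lr=0.01, radius=10.0):
    """Hypothetical toy chain, on-policy (rho = 1), one-hot features:
    state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0."""
    transitions = [
        ([1.0, 0.0], [0.0, 1.0], 1.0),
        ([0.0, 1.0], [1.0, 0.0], 0.0),
    ]
    theta, y = [0.0, 0.0], [0.0, 0.0]
    for t in range(n_steps):
        phi, phi_next, r = transitions[t % 2]
        theta, y = step_fn(theta, y, phi, phi_next, r, gamma, 1.0, lr, lr, radius)
    return theta


if __name__ == "__main__":
    # Exact values for this chain: V(0) = 1/(1 - gamma^2), V(1) = gamma/(1 - gamma^2).
    print("GTD2:   ", run(gtd2_step))
    print("GTD2-MP:", run(gtd2_mp_step))
```

On this toy chain the exact values are V(0) = 1/(1 − γ²) and V(1) = γ/(1 − γ²), so with γ = 0.6 both variants should approach [1.5625, 0.9375]. Each step costs O(d) arithmetic for d features, which is the linear per-step complexity the abstract contrasts with least-squares TD.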


Similar resources

Proximal Gradient Temporal Difference Learning Algorithms

In this paper, we describe proximal gradient temporal difference learning, which provides a principled way for designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to pri...


Finite Sample Analysis for TD(0) with Linear Function Approximation

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such a result. Works that managed to obtain concentration bounds for online Temporal Difference (TD) methods analyzed modified versions of them, carefully crafted f...


Common Zero Points of Two Finite Families of Maximal Monotone Operators via Proximal Point Algorithms

In this work, iterative schemes are presented for finding common points of the solution set of a system of generalized mixed equilibrium problems, the solution set of the variational inequality for an inverse-strongly monotone operator, the common fixed-point set of two infinite sequences of relatively nonexpansive mappings, and the common zero-point set of two finite sequences of maximal monot...


Image Restoration with Two-Dimensional Adaptive Filter Algorithms

Two-dimensional (TD) adaptive filtering is a technique that can be applied to many image and signal processing applications. This paper extends one-dimensional adaptive filter algorithms to TD structures, establishing novel TD adaptive filters. Based on this extension, the TD variable step-size normalized least mean squares (TD-VSS-NLMS), the TD-VSS affine projection algorithms (...


Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximat...




Publication date: 2015
